[PoC] TPU receiver prometheus metrics #76
base: prometheus
Conversation
let max_channel_len = self.packets_count.swap(0, Ordering::Relaxed);
{
    let mut stats = self.total_stats.lock().unwrap();
Nice! We accumulate these values now 👌
What are `packet_batches_count`, `full_packet_batches_count` and `max_channel_len`?
So this is all kind of messy, but from what I understand:
- `packets_count` is just the number of UDP packets received specifically on TPU sockets.
- `packet_batches_count` is the number of "batches" received; Solana reads packets from the socket in batches of up to 64 packets.
- `full_packet_batches_count` is the number of batches that were full (so containing 64 packets).
- `max_channel_len`: after reading from the socket, packets are forwarded to an unbounded channel. This value is the max length of that channel (across all current measurements), measured each time before forwarding new packets. So it will basically be a max value over 1 second, since the stats are then reset 🤷
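To make those names concrete, here is a rough, hypothetical sketch of how such counters could be kept and updated. The field names come from this thread, but the struct, the atomic types and the `record_batch` helper are assumptions, not the actual Solana code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch; field names taken from the discussion above,
// everything else (types, update site) is an assumption.
struct ReceiverStatsSketch {
    packets_count: AtomicUsize,             // UDP packets received on TPU sockets
    packet_batches_count: AtomicUsize,      // batches read from the socket (up to 64 packets each)
    full_packet_batches_count: AtomicUsize, // batches that contained the full 64 packets
    max_channel_len: AtomicUsize,           // max observed length of the forwarding channel
}

impl ReceiverStatsSketch {
    // Called after reading one batch from the socket, just before forwarding it.
    fn record_batch(&self, batch_len: usize, channel_len: usize) {
        self.packets_count.fetch_add(batch_len, Ordering::Relaxed);
        self.packet_batches_count.fetch_add(1, Ordering::Relaxed);
        if batch_len == 64 {
            self.full_packet_batches_count.fetch_add(1, Ordering::Relaxed);
        }
        // Track the maximum channel length seen since the last report/reset.
        self.max_channel_len.fetch_max(channel_len, Ordering::Relaxed);
    }
}
```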
Can you add those as the `HELP` line in the exported metrics?
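For illustration only, a minimal sketch of how such `HELP` lines could be written into the text exposition output; the metric names and wording below are invented, not what this PR currently exports.

```rust
use std::fmt::Write;

// Hypothetical metric names; the real exporter may name and group these differently.
fn write_tpu_receiver_help(out: &mut String) -> std::fmt::Result {
    writeln!(out, "# HELP solana_tpu_packets_received_total UDP packets received on TPU sockets.")?;
    writeln!(out, "# TYPE solana_tpu_packets_received_total counter")?;
    writeln!(out, "# HELP solana_tpu_packet_batches_received_total Batches (up to 64 packets) read from TPU sockets.")?;
    writeln!(out, "# TYPE solana_tpu_packet_batches_received_total counter")?;
    writeln!(out, "# HELP solana_tpu_receiver_max_channel_len Max observed length of the receive channel since the last report.")?;
    writeln!(out, "# TYPE solana_tpu_receiver_max_channel_len gauge")?;
    Ok(())
}
```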
> So it will basically be a max value over 1 second, since the stats are then reset
I don’t know how it measures this, but if the channel is backed by something similar to a `Vec` (I think it might be a deque in practice, but the same applies), and the max measures the capacity of the vec/deque, then it would persist even if the metrics are reset, unless they `shrink_to_fit`. But this is pure speculation, I haven’t looked into how it is implemented.
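As a small aside on that speculation: in Rust the backing capacity of a `VecDeque` does survive clearing its contents unless `shrink_to_fit` is called, which is easy to check (this says nothing about how the channel in question is actually implemented):

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<u64> = VecDeque::new();
    queue.extend(0..10_000); // grow the backing buffer
    queue.clear();           // length drops back to 0 ...
    assert!(queue.capacity() >= 10_000); // ... but the capacity persists
    queue.shrink_to_fit();   // only this releases the memory
    assert!(queue.capacity() < 10_000);
}
```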
@@ -379,6 +380,7 @@ impl Validator {
    socket_addr_space: SocketAddrSpace,
    use_quic: bool,
    tpu_connection_pool_size: usize,
    prometheus_collector: Option<PrometheusCollector>,
Although it makes sense to put new arguments at the end of the function parameters, to avoid conflicts it's better to put new arguments in the middle of the other arguments; that makes rebasing easier if something changes in the argument list.
This looks pretty clean and not very invasive, which is a good thing 👍 I wonder though what would happen if we turned it around: instead of making `StreamerReceiveStats` have a reference to the metrics and push to it, could the Prometheus config have a reference to `StreamerReceiveStats` and read from it? I can’t predict what upstream Solana people will say, but so far it looks like they are very concerned about this slowing down the validator. (Though a single branch is negligible either way compared to all the string formatting for the logs ...) I genuinely don’t know how it would turn out, but it may be worth trying to see which approach is cleaner.
Also it would be nice if we could avoid the polling/pushing delay, but I don’t know if that will make it much more complex. Hooking the `report` function is definitely tempting because it makes it a small and self-contained change.
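A very rough sketch of that inverted direction, under the assumption that the counters are kept as running totals; `ReceiverStats` and `Collector` here are placeholders, not the actual `StreamerReceiveStats`/`PrometheusCollector` types in this PR.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

// Placeholder stand-in for StreamerReceiveStats, just to show the shape of the idea.
struct ReceiverStats {
    packets_count: AtomicU64, // assumed to be a running total, never reset
}

// The Prometheus side keeps shared handles and reads the atomics at scrape time.
#[derive(Default)]
struct Collector {
    tpu_receivers: Vec<Arc<ReceiverStats>>,
}

impl Collector {
    fn register_tpu_receiver(&mut self, stats: Arc<ReceiverStats>) {
        self.tpu_receivers.push(stats);
    }

    // Called from the /metrics handler; no background polling loop involved.
    fn render(&self) -> String {
        let total: u64 = self
            .tpu_receivers
            .iter()
            .map(|s| s.packets_count.load(Ordering::Relaxed))
            .sum();
        format!("solana_tpu_packets_received_total {total}\n")
    }
}
```

In this arrangement the receiving hot path would not need to know about Prometheus at all; it only keeps incrementing the atomics it already owns.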
let stats = {
    tpu_stats.total_stats.lock().unwrap().clone()
};
collector.lock().unwrap().save_tpu_receiver_stats(stats)
If I understand correctly, this will keep one copy of the metrics with the Prometheus config, periodically “push” to it, and then the endpoint serves whatever the latest value it had is.
This creates a polling delay: the value that we serve on `/metrics` can be outdated by the cycle interval of this loop. It looks like that interval is 1 second, so it’s not too bad, but it would be nicer if we could get the actual current values. If we can get the `StreamerReceiveStats` into the Prometheus endpoint, it could do the atomic reads there, but it means we would need to make them totals. We could then handle the resetting here, at this point.
Something like this:
use std::sync::atomic::{AtomicU64, Ordering};

// Snapshots are plain u64s, so they can be Copy; the atomic variant never needs it.
#[derive(Clone, Copy)]
struct GoatCounter<T> {
    num_goats_teleported: T,
    num_goats_received: T,
}

type GoatCounterCurrent = GoatCounter<AtomicU64>;
type GoatCounterSnapshot = GoatCounter<u64>;

impl GoatCounterCurrent {
    pub fn snapshot(&self) -> GoatCounterSnapshot {
        GoatCounterSnapshot {
            num_goats_teleported: self.num_goats_teleported.load(Ordering::Relaxed),
            num_goats_received: self.num_goats_received.load(Ordering::Relaxed),
        }
    }
}

impl std::ops::Sub for GoatCounterSnapshot {
    type Output = Self;

    fn sub(self, rhs: Self) -> Self {
        Self {
            num_goats_teleported: self.num_goats_teleported - rhs.num_goats_teleported,
            num_goats_received: self.num_goats_received - rhs.num_goats_received,
        }
    }
}

// In the place where we print the metrics:
let mut last_logged: GoatCounterSnapshot = todo!();
let current = goat_counter.snapshot();
log(current - last_logged);
last_logged = current;

// In the Prometheus endpoint:
let current = goat_counter.snapshot();
write_metrics(&current, ...);
Not sure if this is feasible in practice though, it looks like there is something going on with accumulation over multiple threads? Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate, and you have counters `num_requests` and `num_errors`, where it should be the case that `num_errors <= num_requests`, then if we load those counters with `Ordering::Relaxed`, that inequality could be violated. It requires some thought both at the place where the counters are read and the place where they are incremented to get it right. I’m not sure Solana does this, since they prioritize speed over correctness, and `Ordering::Relaxed` does give the compiler and CPU more freedom to optimize than some of the other orderings ...
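For what it’s worth, one way (of several) to preserve the observed `num_errors <= num_requests` invariant is to pair the increment order with the read order and a release/acquire edge on the error counter. This is just an illustration of the concern, not how Solana’s counters work.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct RequestStats {
    num_requests: AtomicU64,
    num_errors: AtomicU64,
}

impl RequestStats {
    // Writer side: count the request before any error for it can be counted.
    fn record_request(&self) {
        self.num_requests.fetch_add(1, Ordering::Relaxed);
    }

    fn record_error(&self) {
        // Release: the matching request increment (sequenced before this)
        // becomes visible to any reader that acquires this counter.
        self.num_errors.fetch_add(1, Ordering::Release);
    }

    // Reader side: load errors first with Acquire, then requests, so every
    // observed error comes with its request and the rate cannot exceed 100%.
    fn error_rate(&self) -> f64 {
        let errors = self.num_errors.load(Ordering::Acquire);
        let requests = self.num_requests.load(Ordering::Relaxed);
        if requests == 0 {
            0.0
        } else {
            errors as f64 / requests as f64
        }
    }
}
```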
Hmm, I see, so that would basically result in changing how the logic of the current Solana metrics is implemented. I would be a bit hesitant to change this Solana metrics logic, because to make it consistent we would likely need to do it in all the places this pattern is followed (which seems to be a lot).
> Not sure if this is feasible in practice though, it looks like there is something going on with accumulation over multiple threads?
Yes, that is what is happening.
> Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate, and you have counters num_requests and num_errors
Yeah, there are definitely possible inconsistencies, but I guess that is really common in metrics. Using the example of `num_requests` and `num_errors` in some other application: `num_requests` is likely to be incremented at the beginning of handling the request, while `num_errors` for obvious reasons can only be incremented later, so it won't be an atomic operation.
P.S. I now want to see this whole application for teleporting goats 🤣
> num_requests is likely to be incremented at the beginning of handling the request while num_errors for obvious reasons can only be incremented later, so it won't be an atomic operation.
That by itself is not an issue, since you still couldn’t get a >100% error rate. If you increment the error counter before the request counter, then you could get a >100% error rate, which is really counter-intuitive. (Or if you read the request counter before the error counter.)
> I now want to see this whole application for teleporting goats
You might enjoy these :-)
https://crbug.com/31482
https://codereview.chromium.org/543045
@Szymongib do you still plan to pick this up some time? It’s not high prio, but I think it would be nice to have.
Hmm, I thought we did not really want to pursue it further, but if you think it is worth it I will add it back to my todo list.
Summary of Changes
This is a proof of concept of how we could get some other metrics into Prometheus.
This approach works and does not change the behaviour of the existing Solana metrics, but it has some downsides, mainly the need to maintain `_total` variables and update them correctly (perhaps we could do some of that with proc macros, building on @enriquefynn's idea).
This is a draft so we can discuss this or other possible approaches.
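To make the `_total` bookkeeping concrete, here is a hedged sketch of the kind of accumulation step this implies; the type and field names are invented for illustration and do not match the PR's code.

```rust
// Invented names; only meant to show the accumulate-into-totals pattern.
#[derive(Default, Clone)]
struct TpuReceiverTotals {
    packets_count_total: u64,
    packet_batches_count_total: u64,
}

impl TpuReceiverTotals {
    // Fold the per-interval (reset-every-second) values into running totals,
    // so Prometheus can expose them as monotonically increasing counters.
    fn accumulate(&mut self, interval_packets: u64, interval_batches: u64) {
        self.packets_count_total += interval_packets;
        self.packet_batches_count_total += interval_batches;
    }
}
```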